AITopics | factual error

Benchmarking Foundation Models with Language-Model-as-an-Examiner

Bai, Yushi

Neural Information Processing Systems | Feb-18-2026, 00:20:50 GMT

Our data and benchmarking results are available at: http://lmexam.xlore.cn.

examiner, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > China > Beijing > Beijing (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Public Health (0.97)
Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

8b8a7960d343e023a6a0afe37eee6022-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems | Feb-15-2026, 18:32:45 GMT

large language model, machine learning, reactor, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > France (0.05)
Europe > Russia (0.05)
(11 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Education (0.93)
Energy > Power Industry > Utilities > Nuclear (0.48)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

df438caa36714f69277daa92d608dd63-Paper-Conference.pdf

Neural Information Processing Systems | Feb-12-2026, 09:31:42 GMT

arxiv preprint arxiv, factuality, knowledge, (13 more...)

Neural Information Processing Systems

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Illinois (0.04)
(3 more...)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

FELM: Benchmarking Factuality Evaluation of Large Language Models

Neural Information Processing Systems | Dec-26-2025, 07:41:00 GMT

Assessing factuality of text generated by large language models (LLMs) is an emerging yet crucial research area, aimed at alerting users to potential errors and guiding the development of more reliable LLMs. Nonetheless, the evaluators assessing factuality necessitate suitable evaluation themselves to gauge progress and foster advancements. This direction remains under-explored, resulting in substantial impediments to the progress of factuality evaluators. To mitigate this issue, we introduce a benchmark for Factuality Evaluation of large Language Models, referred to as FELM. In this benchmark, we collect responses generated from LLMs and annotate factuality labels in a fine-grained manner. Contrary to previous studies that primarily concentrate on the factuality of world knowledge (e.g.

benchmarking factuality evaluation, felm, language model, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Multi-Modal Fact-Verification Framework for Reducing Hallucinations in Large Language Models

Patel, Piyushkumar

arXiv.org Artificial Intelligence | Oct-28-2025

While Large Language Models have transformed how we interact with AI systems, they suffer from a critical flaw: they confidently generate false information that sounds entirely plausible. This hallucination problem has become a major barrier to deploying these models in real-world applications where accuracy matters. We developed a fact-verification framework that catches and corrects these errors in real-time by cross-checking LLM outputs against multiple knowledge sources. Our system combines structured databases, live web searches, and academic literature to verify factual claims as they're generated. When we detect inconsistencies, we automatically correct them while preserving the natural flow of the response. Testing across various domains showed we could reduce hallucinations by 67% without sacrificing response quality. Domain experts in healthcare, finance, and scientific research rated our corrected outputs 89% satisfactory--a significant improvement over unverified LLM responses. This work offers a practical solution for making LLMs more trustworthy in applications where getting facts wrong isn't an option.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.22751

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (0.93)
Banking & Finance (0.93)
Health & Medicine (0.89)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations

Chen, Tong, Asai, Akari, Zettlemoyer, Luke, Hajishirzi, Hannaneh, Brahman, Faeze

arXiv.org Artificial Intelligence | Oct-21-2025

Language models often generate factually incorrect information unsupported by their training data, a phenomenon known as extrinsic hallucination. Existing mitigation approaches often degrade performance on open-ended generation and downstream tasks, limiting their practical utility. We propose an online reinforcement learning method using a novel binary retrieval-augmented reward (RAR) to address this tradeoff. Unlike continuous reward schemes, our approach assigns a reward of one only when the model's output is entirely factually correct, and zero otherwise. We evaluate our method on Qwen3 reasoning models across diverse tasks. For open-ended generation, binary RAR achieves a 39.3% reduction in hallucination rates, substantially outperforming both supervised training and continuous-reward RL baselines. In short-form question answering, the model learns calibrated abstention, strategically outputting "I don't know" when faced with insufficient parametric knowledge. This yields 44.4% and 21.7% fewer incorrect answers on PopQA and GPQA, respectively. Crucially, these factuality gains come without performance degradation on instruction following, math, or code, whereas continuous-reward RL, despite improving factuality, induces quality regressions.
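
The all-or-nothing reward described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-claim support flags are assumed to come from a retrieval-based verifier that is not shown, the function names `binary_reward` and `continuous_reward` are hypothetical, and treating a response with no checkable claims (for example an abstention) as error-free is an assumption.

```python
from typing import Sequence


def binary_reward(claim_supported: Sequence[bool]) -> float:
    """Return 1.0 only if every claim is supported, otherwise 0.0."""
    # Assumption: a response with no checkable claims counts as error-free.
    return 1.0 if all(claim_supported) else 0.0


def continuous_reward(claim_supported: Sequence[bool]) -> float:
    """Fraction of supported claims, shown only as a contrast case."""
    if not claim_supported:
        return 1.0  # Same assumption as above for an empty claim list.
    return sum(claim_supported) / len(claim_supported)


# One unsupported claim out of four (hypothetical verifier output).
flags = [True, True, False, True]
print(binary_reward(flags))      # 0.0
print(continuous_reward(flags))  # 0.75
```

Under the binary scheme a single unsupported claim removes the whole reward, whereas the fractional contrast case still gives partial credit to a response that contains an error.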

large language model, machine learning, reinforcement learning, (22 more...)

arXiv.org Artificial Intelligence

2510.17733

Country:

North America > United States (1.00)
Asia (0.68)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

f64e55d03e2fe61aa4114e49cb654acb-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems | Oct-9-2025, 12:03:05 GMT

examiner, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > China > Beijing > Beijing (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Public Health (0.97)
Information Technology (0.93)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

8b8a7960d343e023a6a0afe37eee6022-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems | Oct-9-2025, 00:50:06 GMT

large language model, machine learning, reactor, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > France (0.05)
Europe > Russia (0.05)
(11 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Education (0.93)
Energy > Power Industry > Utilities > Nuclear (0.48)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Factuality Enhanced Language Models for Open-Ended Text Generation

Neural Information Processing Systems | Aug-19-2025, 12:23:49 GMT

Pretrained language models (LMs) are prone to generating text with nonfactual information. In this work, we measure and improve the factual accuracy of large-scale LMs for open-ended text generation.

arxiv preprint arxiv, large language model, natural language, (16 more...)

Neural Information Processing Systems

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Illinois (0.04)
(3 more...)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

Trustworthy Reasoning: Evaluating and Enhancing Factual Accuracy in LLM Intermediate Thought Processes

Jiao, Rui, Zhang, Yue, Li, Jinku

arXiv.org Artificial Intelligence | Aug-5-2025

We present a novel framework addressing a critical vulnerability in Large Language Models (LLMs): the prevalence of factual inaccuracies within intermediate reasoning steps despite correct final answers. This phenomenon poses substantial risks in high-stakes domains including healthcare, legal analysis, and scientific research, where erroneous yet confidently presented reasoning can mislead users into dangerous decisions. Our framework integrates three core components: (1) a specialized fact-checking classifier trained on counterfactually augmented data to detect subtle factual inconsistencies within reasoning chains; (2) an enhanced Group Relative Policy Optimization (GRPO) reinforcement learning approach that balances factuality, coherence, and structural correctness through multi-dimensional rewards; and (3) a mechanistic interpretability method examining how factuality improvements manifest in model activations during reasoning processes. Extensive evaluation across multiple state-of-the-art models reveals concerning patterns: even leading models like Claude-3.7 and GPT-o1 demonstrate reasoning factual accuracy of only 81.93% and 82.57% respectively. Our approach significantly enhances factual robustness (up to 49.90% improvement) while maintaining or improving performance on challenging benchmarks including Math-500, AIME-2024, and GPQA. Furthermore, our neural activation-level analysis provides actionable insights into how factual enhancements reshape reasoning trajectories within model architectures, establishing foundations for future training methodologies that explicitly target factual robustness through activation-guided optimization.
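
The combination of a multi-dimensional reward with group-relative optimization mentioned in the abstract above can be illustrated roughly as below. This is a sketch under stated assumptions, not the authors' enhanced method: the weights, the score ranges, and the function names are hypothetical, and only the general GRPO idea of normalizing each reward against the other responses sampled for the same prompt is shown.

```python
from statistics import mean, pstdev
from typing import List, Tuple

# Hypothetical weights for (factuality, coherence, structure); not from the paper.
WEIGHTS = (0.5, 0.3, 0.2)


def combined_reward(scores: Tuple[float, float, float]) -> float:
    """Weighted sum of the three per-response scores, each assumed in [0, 1]."""
    return sum(w * s for w, s in zip(WEIGHTS, scores))


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of sampled responses."""
    mu = mean(rewards)
    # Population standard deviation is used here; implementations may differ.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scores for three responses sampled for the same prompt.
group = [(0.9, 0.8, 1.0), (0.4, 0.9, 1.0), (0.7, 0.6, 0.5)]
rewards = [combined_reward(s) for s in group]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above the group mean receive positive advantages and those below receive negative ones, so the update depends on relative rather than absolute reward.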

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.2294

Country: